{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading and working with data in sktime\n",
    "\n",
    "Python provides a variety of useful ways to represent data, but NumPy arrays and pandas DataFrames are commonly used for data analysis. When using NumPy 2d-arrays or pandas DataFrames to analyze tabular data the rows are commony used to represent each instance (e.g. case or observation) of the data, while the columns are used to represent a given feature (e.g. variable or dimension) for an observation. Since timeseries data also has a time dimension for a given instance and feature, several alternative data formats could be used to represent this data, including nested pandas DataFrame structures, NumPy 3d-arrays, or multi-indexed pandas DataFrames. \n",
    "\n",
    "Sktime is designed to work with timeseries data stored as nested pandas DataFrame objects. Similar to working with pandas DataFrames with tabular data, this allows instances to be represented by rows and the feature data for each dimension of a problem (e.g. variables or features) to be stored in the DataFrame columns. To accomplish this the timepoints for each instance-feature combination are stored in a single cell in the input Pandas DataFrame ([see Sktime pandas DataFrame format](#sktime_df_format) for more details). \n",
    "\n",
    "Users can load or convert data into sktime's format in a variety of ways. Data can be loaded directly from a bespoke sktime file format (.ts) ([see Representing data with .ts files](#ts_files)) or supported file formats provided by [other existing data sources](#other_file_types) (such as Weka ARFF and .tsv). Sktime also provides functions to convert data to and from sktime's nested pandas DataFrame format and several other common ways for representing timeseries data using NumPy arrays or pandas DataFrames. [see Converting between sktime and alternative timeseries formats](#convert).\n",
    "\n",
    "The rest of this sktime tutorial will provide a more detailed description of the sktime pandas DataFrame format, a brief description of the .ts file format, how to load data from other supported formats, and how to convert between other common ways of representing timeseries data in NumPy arrays or pandas DataFrames.\n",
    "\n",
    "<a id=\"sktime_df_format\"></a>\n",
    "## Sktime pandas DataFrame format\n",
    "\n",
    "The core data structure for storing datasets in sktime is a _nested_ pandas DataFrame, where rows of the dataframe correspond to instances (cases or observations),  and columns correspond to dimensions of the problem (features or variables). The multiple timepoints and their corresponding values for each instance-feature pair are stored as pandas Series object _nested_ within the applicable DataFrame cell.\n",
    "\n",
    "For example, for a problem with n cases that each have data across c timeseries dimensions:\n",
    "\n",
    "    DataFrame:\n",
    "    index |   dim_0   |   dim_1   |    ...    |  dim_c-1\n",
    "       0  | pd.Series | pd.Series | pd.Series | pd.Series\n",
    "       1  | pd.Series | pd.Series | pd.Series | pd.Series\n",
    "      ... |    ...    |    ...    |    ...    |    ...\n",
    "       n  | pd.Series | pd.Series | pd.Series | pd.Series\n",
    "\n",
    "Representing timeseries data in this way makes it easy to align the timeseries features for a given instance with non-timeseries information. For example, in  a classification problem, it is easy to align the timeseries features for an observation with its (index-aligned) target class label:\n",
    "\n",
    "    index | class_val\n",
    "      0   |   int\n",
    "      1   |   int\n",
    "     ...  |   ...\n",
    "      n   |   int\n",
    "\n",
    "\n",
    "While sktime's format uses pandas Series objects in its nested DataFrame structure, other data structures like NumPy arrays could be used to hold the timeseries values in each cell. However, the use of pandas Series objects helps to facilitate simple storage of sparse data and make it easy to accomodate series with non-integer timestamps (such as dates). \n",
    "\n",
    "\n",
    "<a id=\"ts_files\"></a>\n",
    "## The .ts file format\n",
    "One common use case is to load locally stored data. To make this easy, the .ts file format has been created for representing problems in a standard format for use with sktime. \n",
    "\n",
    "### Representing data with .ts files\n",
    "A .ts file include two main parts:\n",
    "* header information\n",
    "* data\n",
    "\n",
    "The header information is used to facilitate simple representation of the data through including metadata about the structure of the problem. The header contains the following:\n",
    "\n",
    "    @problemName <problem name>\n",
    "    @timeStamps <true/false>\n",
    "    @univariate <true/false>\n",
    "    @classLabel <true/false> <space delimited list of possible class values>\n",
    "    @data\n",
    "\n",
    "The data for the problem should begin after the @data tag. In the simplest case where @timestamps is false, values for a series are expressed in a comma-separated list and the index of each value is relative to its position in the list (0, 1, ..., m). An _instance_ may contain 1 to many dimensions, where instances are line-delimited and dimensions within an instance are colon (:) delimited. For example:\n",
    "\n",
    "    2,3,2,4:4,3,2,2\n",
    "    13,12,32,12:22,23,12,32\n",
    "    4,4,5,4:3,2,3,2\n",
    "\n",
    "This example data has 3 _instances_, corresponding to the three lines shown above. Each instance has 2 _dimensions_ with 4 observations per dimension. For example, the intitial instance's first dimension has the timepoint values of 2, 3, 2, 4 and the second dimension has the values 4, 3, 2, 2.\n",
    "\n",
    "Missing readings can be specified using ?. For example, \n",
    "\n",
    "    2,?,2,4:4,3,2,2\n",
    "    13,12,32,12:22,23,12,32\n",
    "    4,4,5,4:3,2,3,2\n",
    "    \n",
    "would indicate the second timepoint value of the initial instance's first dimension is missing. \n",
    "\n",
    "Alternatively, for sparse datasets, readings can be specified by setting @timestamps to true in the header and representing the data with tuples in the form of (timestamp, value) just for the obser. For example, the first instance in the example above could be specified in this representation as:\n",
    "\n",
    "    (0,2),(1,3)(2,2)(3,4):(0,4),(1,3),(2,2),(3,2)\n",
    "\n",
    "Equivalently, the sparser example\n",
    "\n",
    "    2,5,?,?,?,?,?,5,?,?,?,?,4\n",
    "\n",
    "could be represented with just the non-missing timestamps as:\n",
    "\n",
    "    (0,2),(0,5),(7,5),(12,4)\n",
    "\n",
    "When using the .ts file format to store data for timeseries classification problems, the class label for an instance should be specified in the last dimension and @classLabel should be set to true in the header information and be followed by the set of possible class values. For example, if a case consists of a single dimension and has a class value of 1 it would be specified as:\n",
    "\n",
    "     1,4,23,34:1\n",
    "\n",
    "\n",
    "### Loading from .ts file to pandas DataFrame\n",
    "\n",
    "A dataset can be loaded from a .ts file using the following method in sktime.utils.data_io.py:\n",
    "\n",
    "    load_from_tsfile_to_dataframe(full_file_path_and_name, replace_missing_vals_with='NaN')\n",
    "\n",
    "This can be demonstrated using the Arrow Head problem that is included in sktime under sktime/datasets/data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:13.134330Z",
     "iopub.status.busy": "2020-12-19T14:32:13.133562Z",
     "iopub.status.idle": "2020-12-19T14:32:13.811083Z",
     "shell.execute_reply": "2020-12-19T14:32:13.811445Z"
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import sktime\n",
    "from sktime.utils.data_io import load_from_tsfile_to_dataframe\n",
    "\n",
    "DATA_PATH = os.path.join(os.path.dirname(sktime.__file__), \"datasets/data\")\n",
    "\n",
    "train_x, train_y = load_from_tsfile_to_dataframe(\n",
    "    os.path.join(DATA_PATH, \"ArrowHead/ArrowHead_TRAIN.ts\")\n",
    ")\n",
    "test_x, test_y = load_from_tsfile_to_dataframe(\n",
    "    os.path.join(DATA_PATH, \"ArrowHead/ArrowHead_TEST.ts\")\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Train and test partitions of the ArrowHead problem have been loaded into nested dataframes with an associated array of class values. As an example, below are the first 5 rows from the train_x and train_y:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:13.828436Z",
     "iopub.status.busy": "2020-12-19T14:32:13.823584Z",
     "iopub.status.idle": "2020-12-19T14:32:13.831026Z",
     "shell.execute_reply": "2020-12-19T14:32:13.831523Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dim_0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0     -1.9630\n",
       "1     -1.9578\n",
       "2     -1.9561\n",
       "3   ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0     -1.7746\n",
       "1     -1.7740\n",
       "2     -1.7766\n",
       "3   ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0     -1.8660\n",
       "1     -1.8420\n",
       "2     -1.8350\n",
       "3   ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0     -2.0738\n",
       "1     -2.0733\n",
       "2     -2.0446\n",
       "3   ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0     -1.7463\n",
       "1     -1.7413\n",
       "2     -1.7227\n",
       "3   ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               dim_0\n",
       "0  0     -1.9630\n",
       "1     -1.9578\n",
       "2     -1.9561\n",
       "3   ...\n",
       "1  0     -1.7746\n",
       "1     -1.7740\n",
       "2     -1.7766\n",
       "3   ...\n",
       "2  0     -1.8660\n",
       "1     -1.8420\n",
       "2     -1.8350\n",
       "3   ...\n",
       "3  0     -2.0738\n",
       "1     -2.0733\n",
       "2     -2.0446\n",
       "3   ...\n",
       "4  0     -1.7463\n",
       "1     -1.7413\n",
       "2     -1.7227\n",
       "3   ..."
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_x.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:13.834947Z",
     "iopub.status.busy": "2020-12-19T14:32:13.834437Z",
     "iopub.status.idle": "2020-12-19T14:32:13.836849Z",
     "shell.execute_reply": "2020-12-19T14:32:13.837412Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array(['0', '1', '2', '0', '1'], dtype='<U1')"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_y[0:5]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"other_file_types\"></a>\n",
    "## Loading other file formats\n",
    "Researchers who have made timeseries data available have used two other common formats, including:\n",
    "\n",
    "+ Weka ARFF files\n",
    "+ UCR .tsv files\n",
    "\n",
    "\n",
    "### Loading from Weka ARFF files\n",
    "\n",
    "It is also possible to load data from Weka's attribute-relation file format (ARFF) files. Data for timeseries problems are made available in this format by researchers at the University of East Anglia (among others) at www.timeseriesclassification.com. The `load_from_arff_to_dataframe` method in `sktime.utils.data_io` supports reading data for both univariate and multivariate timeseries problems. \n",
    "\n",
    "The univariate functionality is demonstrated below using data on the ArrowHead problem again (this time loading from ARFF file)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:13.840562Z",
     "iopub.status.busy": "2020-12-19T14:32:13.840050Z",
     "iopub.status.idle": "2020-12-19T14:32:13.869367Z",
     "shell.execute_reply": "2020-12-19T14:32:13.869937Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dim_0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0     -1.963009\n",
       "1     -1.957825\n",
       "2     -1.95614...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0     -1.774571\n",
       "1     -1.774036\n",
       "2     -1.77658...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0     -1.866021\n",
       "1     -1.841991\n",
       "2     -1.83502...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0     -2.073758\n",
       "1     -2.073301\n",
       "2     -2.04460...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0     -1.746255\n",
       "1     -1.741263\n",
       "2     -1.72274...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               dim_0\n",
       "0  0     -1.963009\n",
       "1     -1.957825\n",
       "2     -1.95614...\n",
       "1  0     -1.774571\n",
       "1     -1.774036\n",
       "2     -1.77658...\n",
       "2  0     -1.866021\n",
       "1     -1.841991\n",
       "2     -1.83502...\n",
       "3  0     -2.073758\n",
       "1     -2.073301\n",
       "2     -2.04460...\n",
       "4  0     -1.746255\n",
       "1     -1.741263\n",
       "2     -1.72274..."
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sktime.utils.data_io import load_from_arff_to_dataframe\n",
    "\n",
    "X, y = load_from_arff_to_dataframe(\n",
    "    os.path.join(DATA_PATH, \"ArrowHead/ArrowHead_TRAIN.arff\")\n",
    ")\n",
    "X.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The multivariate BasicMotions problem is used below to illustrate the ability to read multivariate timeseries data from ARFF files into the sktime format. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:13.873136Z",
     "iopub.status.busy": "2020-12-19T14:32:13.872669Z",
     "iopub.status.idle": "2020-12-19T14:32:13.946859Z",
     "shell.execute_reply": "2020-12-19T14:32:13.947377Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dim_0</th>\n",
       "      <th>dim_1</th>\n",
       "      <th>dim_2</th>\n",
       "      <th>dim_3</th>\n",
       "      <th>dim_4</th>\n",
       "      <th>dim_5</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0     0.079106\n",
       "1     0.079106\n",
       "2    -0.903497\n",
       "3...</td>\n",
       "      <td>0     0.394032\n",
       "1     0.394032\n",
       "2    -3.666397\n",
       "3...</td>\n",
       "      <td>0     0.551444\n",
       "1     0.551444\n",
       "2    -0.282844\n",
       "3...</td>\n",
       "      <td>0     0.351565\n",
       "1     0.351565\n",
       "2    -0.095881\n",
       "3...</td>\n",
       "      <td>0     0.023970\n",
       "1     0.023970\n",
       "2    -0.319605\n",
       "3...</td>\n",
       "      <td>0     0.633883\n",
       "1     0.633883\n",
       "2     0.972131\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0     0.377751\n",
       "1     0.377751\n",
       "2     2.952965\n",
       "3...</td>\n",
       "      <td>0    -0.610850\n",
       "1    -0.610850\n",
       "2     0.970717\n",
       "3...</td>\n",
       "      <td>0    -0.147376\n",
       "1    -0.147376\n",
       "2    -5.962515\n",
       "3...</td>\n",
       "      <td>0    -0.103872\n",
       "1    -0.103872\n",
       "2    -7.593275\n",
       "3...</td>\n",
       "      <td>0    -0.109198\n",
       "1    -0.109198\n",
       "2    -0.697804\n",
       "3...</td>\n",
       "      <td>0    -0.037287\n",
       "1    -0.037287\n",
       "2    -2.865789\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0    -0.813905\n",
       "1    -0.813905\n",
       "2    -0.424628\n",
       "3...</td>\n",
       "      <td>0     0.825666\n",
       "1     0.825666\n",
       "2    -1.305033\n",
       "3...</td>\n",
       "      <td>0     0.032712\n",
       "1     0.032712\n",
       "2     0.826170\n",
       "3...</td>\n",
       "      <td>0     0.021307\n",
       "1     0.021307\n",
       "2    -0.372872\n",
       "3...</td>\n",
       "      <td>0     0.122515\n",
       "1     0.122515\n",
       "2    -0.045277\n",
       "3...</td>\n",
       "      <td>0     0.775041\n",
       "1     0.775041\n",
       "2     0.383526\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0     0.289855\n",
       "1     0.289855\n",
       "2    -0.669185\n",
       "3...</td>\n",
       "      <td>0     0.284130\n",
       "1     0.284130\n",
       "2    -0.210466\n",
       "3...</td>\n",
       "      <td>0     0.213680\n",
       "1     0.213680\n",
       "2     0.252267\n",
       "3...</td>\n",
       "      <td>0    -0.314278\n",
       "1    -0.314278\n",
       "2     0.018644\n",
       "3...</td>\n",
       "      <td>0     0.074574\n",
       "1     0.074574\n",
       "2     0.007990\n",
       "3...</td>\n",
       "      <td>0    -0.079901\n",
       "1    -0.079901\n",
       "2     0.237040\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0    -0.123238\n",
       "1    -0.123238\n",
       "2    -0.249547\n",
       "3...</td>\n",
       "      <td>0     0.379341\n",
       "1     0.379341\n",
       "2     0.541501\n",
       "3...</td>\n",
       "      <td>0    -0.286006\n",
       "1    -0.286006\n",
       "2     0.208420\n",
       "3...</td>\n",
       "      <td>0    -0.098545\n",
       "1    -0.098545\n",
       "2    -0.023970\n",
       "3...</td>\n",
       "      <td>0     0.058594\n",
       "1     0.058594\n",
       "2     0.175783\n",
       "3...</td>\n",
       "      <td>0    -0.074574\n",
       "1    -0.074574\n",
       "2     0.114525\n",
       "3...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               dim_0  \\\n",
       "0  0     0.079106\n",
       "1     0.079106\n",
       "2    -0.903497\n",
       "3...   \n",
       "1  0     0.377751\n",
       "1     0.377751\n",
       "2     2.952965\n",
       "3...   \n",
       "2  0    -0.813905\n",
       "1    -0.813905\n",
       "2    -0.424628\n",
       "3...   \n",
       "3  0     0.289855\n",
       "1     0.289855\n",
       "2    -0.669185\n",
       "3...   \n",
       "4  0    -0.123238\n",
       "1    -0.123238\n",
       "2    -0.249547\n",
       "3...   \n",
       "\n",
       "                                               dim_1  \\\n",
       "0  0     0.394032\n",
       "1     0.394032\n",
       "2    -3.666397\n",
       "3...   \n",
       "1  0    -0.610850\n",
       "1    -0.610850\n",
       "2     0.970717\n",
       "3...   \n",
       "2  0     0.825666\n",
       "1     0.825666\n",
       "2    -1.305033\n",
       "3...   \n",
       "3  0     0.284130\n",
       "1     0.284130\n",
       "2    -0.210466\n",
       "3...   \n",
       "4  0     0.379341\n",
       "1     0.379341\n",
       "2     0.541501\n",
       "3...   \n",
       "\n",
       "                                               dim_2  \\\n",
       "0  0     0.551444\n",
       "1     0.551444\n",
       "2    -0.282844\n",
       "3...   \n",
       "1  0    -0.147376\n",
       "1    -0.147376\n",
       "2    -5.962515\n",
       "3...   \n",
       "2  0     0.032712\n",
       "1     0.032712\n",
       "2     0.826170\n",
       "3...   \n",
       "3  0     0.213680\n",
       "1     0.213680\n",
       "2     0.252267\n",
       "3...   \n",
       "4  0    -0.286006\n",
       "1    -0.286006\n",
       "2     0.208420\n",
       "3...   \n",
       "\n",
       "                                               dim_3  \\\n",
       "0  0     0.351565\n",
       "1     0.351565\n",
       "2    -0.095881\n",
       "3...   \n",
       "1  0    -0.103872\n",
       "1    -0.103872\n",
       "2    -7.593275\n",
       "3...   \n",
       "2  0     0.021307\n",
       "1     0.021307\n",
       "2    -0.372872\n",
       "3...   \n",
       "3  0    -0.314278\n",
       "1    -0.314278\n",
       "2     0.018644\n",
       "3...   \n",
       "4  0    -0.098545\n",
       "1    -0.098545\n",
       "2    -0.023970\n",
       "3...   \n",
       "\n",
       "                                               dim_4  \\\n",
       "0  0     0.023970\n",
       "1     0.023970\n",
       "2    -0.319605\n",
       "3...   \n",
       "1  0    -0.109198\n",
       "1    -0.109198\n",
       "2    -0.697804\n",
       "3...   \n",
       "2  0     0.122515\n",
       "1     0.122515\n",
       "2    -0.045277\n",
       "3...   \n",
       "3  0     0.074574\n",
       "1     0.074574\n",
       "2     0.007990\n",
       "3...   \n",
       "4  0     0.058594\n",
       "1     0.058594\n",
       "2     0.175783\n",
       "3...   \n",
       "\n",
       "                                               dim_5  \n",
       "0  0     0.633883\n",
       "1     0.633883\n",
       "2     0.972131\n",
       "3...  \n",
       "1  0    -0.037287\n",
       "1    -0.037287\n",
       "2    -2.865789\n",
       "3...  \n",
       "2  0     0.775041\n",
       "1     0.775041\n",
       "2     0.383526\n",
       "3...  \n",
       "3  0    -0.079901\n",
       "1    -0.079901\n",
       "2     0.237040\n",
       "3...  \n",
       "4  0    -0.074574\n",
       "1    -0.074574\n",
       "2     0.114525\n",
       "3...  "
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X, y = load_from_arff_to_dataframe(\n",
    "    os.path.join(DATA_PATH, \"BasicMotions/BasicMotions_TRAIN.arff\")\n",
    ")\n",
    "X.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Loading from UCR .tsv Format Files\n",
    "\n",
    "A further option is to load data into sktime from tab separated value (.tsv) files. Researchers at the University of Riverside, California make a variety of timeseries data available in this format at https://www.cs.ucr.edu/~eamonn/time_series_data_2018. \n",
    "\n",
    "The `load_from_ucr_tsv_to_dataframe` method in `sktime.utils.data_io` supports reading  univariate problems. An example with ArrowHead is given below to demonstrate equivalence with loading from the .ts and ARFF file formats."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:13.958719Z",
     "iopub.status.busy": "2020-12-19T14:32:13.958207Z",
     "iopub.status.idle": "2020-12-19T14:32:13.991444Z",
     "shell.execute_reply": "2020-12-19T14:32:13.992003Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>dim_0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0     -1.963009\n",
       "1     -1.957825\n",
       "2     -1.95614...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0     -1.774571\n",
       "1     -1.774036\n",
       "2     -1.77658...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0     -1.866021\n",
       "1     -1.841991\n",
       "2     -1.83502...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0     -2.073758\n",
       "1     -2.073301\n",
       "2     -2.04460...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0     -1.746255\n",
       "1     -1.741263\n",
       "2     -1.72274...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               dim_0\n",
       "0  0     -1.963009\n",
       "1     -1.957825\n",
       "2     -1.95614...\n",
       "1  0     -1.774571\n",
       "1     -1.774036\n",
       "2     -1.77658...\n",
       "2  0     -1.866021\n",
       "1     -1.841991\n",
       "2     -1.83502...\n",
       "3  0     -2.073758\n",
       "1     -2.073301\n",
       "2     -2.04460...\n",
       "4  0     -1.746255\n",
       "1     -1.741263\n",
       "2     -1.72274..."
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sktime.utils.data_io import load_from_ucr_tsv_to_dataframe\n",
    "\n",
    "X, y = load_from_ucr_tsv_to_dataframe(\n",
    "    os.path.join(DATA_PATH, \"ArrowHead/ArrowHead_TRAIN.tsv\")\n",
    ")\n",
    "X.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"convert\"></a>\n",
    "## Converting between other NumPy and pandas formats\n",
    "\n",
    "It is also possible to use data from sources other than .ts and .arff files by manually shaping the data into the format described above. \n",
    "\n",
    "Functions to convert from and to these types to sktime's nested DataFrame format are provided in `sktime.utils.data_processing`\n",
    "\n",
    "### Using tabular data with sktime\n",
    "\n",
    "One approach to representing timeseries data is a tabular DataFrame. As usual, each row represents an instance. In the tabular setting each timepoint of the univariate timeseries being measured for each instance are treated as feature and stored as a primitive data type in the DataFrame's cells. \n",
    "\n",
    "In a univariate setting, where there are `n` instances of the series and each univariate timeseries has `t` timepoints, this would yield a pandas DataFrame with shape (n, t). In practice, this could be used to represent sensors measuring the same signal over time (features) on different machines (instances) or the same economic variable over time (features) for different countries (instances). \n",
    "\n",
    "The function `from_2d_array_to_nested` converts a (n, t) tabular DataFrame to nested DataFrame with shape (n, 1). To convert from a nested DataFrame to a tabular array the function `from_nested_to_2d_array` can be used.\n",
    "\n",
    "The example below uses 50 instances with 20 timepoints each. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The tabular data has the shape (50, 20)\n"
     ]
    }
   ],
   "source": [
    "from numpy.random import default_rng\n",
    "\n",
    "from sktime.utils.data_processing import (\n",
    "    from_2d_array_to_nested,\n",
    "    from_nested_to_2d_array,\n",
    "    is_nested_dataframe,\n",
    ")\n",
    "\n",
    "rng = default_rng()\n",
    "X_2d = rng.standard_normal((50, 20))\n",
    "print(f\"The tabular data has the shape {X_2d.shape}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `from_2d_array_to_nested` function makes it easy to convert this to a nested DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "X_nested is a nested DataFrame: True\n",
      "The cell contains a <class 'pandas.core.series.Series'>.\n",
      "The nested DataFrame has shape (50, 1)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>0</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0     2.079005\n",
       "1     1.139698\n",
       "2     2.401339\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0    -1.221541\n",
       "1     0.319184\n",
       "2    -2.096536\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0    -2.136296\n",
       "1    -2.129254\n",
       "2     0.867093\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0     0.866488\n",
       "1    -0.690724\n",
       "2     0.962626\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0    -0.991407\n",
       "1     1.694927\n",
       "2    -0.855941\n",
       "3...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                   0\n",
       "0  0     2.079005\n",
       "1     1.139698\n",
       "2     2.401339\n",
       "3...\n",
       "1  0    -1.221541\n",
       "1     0.319184\n",
       "2    -2.096536\n",
       "3...\n",
       "2  0    -2.136296\n",
       "1    -2.129254\n",
       "2     0.867093\n",
       "3...\n",
       "3  0     0.866488\n",
       "1    -0.690724\n",
       "2     0.962626\n",
       "3...\n",
       "4  0    -0.991407\n",
       "1     1.694927\n",
       "2    -0.855941\n",
       "3..."
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_nested = from_2d_array_to_nested(X_2d)\n",
    "print(f\"X_nested is a nested DataFrame: {is_nested_dataframe(X_nested)}\")\n",
    "print(f\"The cell contains a {type(X_nested.iloc[0,0])}.\")\n",
    "print(f\"The nested DataFrame has shape {X_nested.shape}\")\n",
    "X_nested.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This nested DataFrame can also be converted back to a tabular DataFrame using easily. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The tabular data has the shape (50, 20)\n"
     ]
    }
   ],
   "source": [
    "X_2d = from_nested_to_2d_array(X_nested)\n",
    "print(f\"The tabular data has the shape {X_2d.shape}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using long-format data with sktime\n",
    "\n",
    "Timeseries data can also be represented in _long_ format where each row identifies the value for a single timepoint for a given dimension for a given instance. \n",
    "\n",
    "This format may be encountered in a database where each row stores a single value measurement identified by several identification columns. For example, where `case_id` is an id to identify a specific instance in the data, `dimension_id` is an integer between 0 and d-1 for d dimensions in the data, `reading_id` is the index of timepoints for the associated `case_id` and `dimension_id`, and `value` is the actual value of the observation. E.g.:\n",
    "\n",
    "          | case_id | dim_id | reading_id | value\n",
    "     ------------------------------------------------\n",
    "       0  |   int   |  int   |    int     | double\n",
    "       1  |   int   |  int   |    int     | double\n",
    "       2  |   int   |  int   |    int     | double\n",
    "       3  |   int   |  int   |    int     | double\n",
    "       \n",
    "Sktime provides functions to convert to and from the long data format in `sktime.utils.data_processing`. \n",
    "\n",
    "The `from_long_to_nested` function converts from a long format DataFrame to sktime's nested format (with assumptions made on how the data is initially formatted). Conversely, `from_nested_to_long` converts from a sktime nested DataFrame into a long format DataFrame. \n",
    "\n",
    "\n",
    "To demonstrate this functionality the method below creates a dataset with a 50 instances (cases), 5 dimensions and 20 timepoints per dimension."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:13.998282Z",
     "iopub.status.busy": "2020-12-19T14:32:13.997756Z",
     "iopub.status.idle": "2020-12-19T14:32:14.000144Z",
     "shell.execute_reply": "2020-12-19T14:32:14.000992Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>case_id</th>\n",
       "      <th>dim_id</th>\n",
       "      <th>reading_id</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.465112</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.807199</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>0.214023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>0.195609</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>0.487599</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   case_id  dim_id  reading_id     value\n",
       "0        0       0           0  0.465112\n",
       "1        0       0           1  0.807199\n",
       "2        0       0           2  0.214023\n",
       "3        0       0           3  0.195609\n",
       "4        0       0           4  0.487599"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sktime.utils.data_io import generate_example_long_table\n",
    "\n",
    "X = generate_example_long_table(num_cases=50, series_len=20, num_dims=5)\n",
    "\n",
    "X.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>case_id</th>\n",
       "      <th>dim_id</th>\n",
       "      <th>reading_id</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4995</th>\n",
       "      <td>49</td>\n",
       "      <td>4</td>\n",
       "      <td>15</td>\n",
       "      <td>0.093895</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4996</th>\n",
       "      <td>49</td>\n",
       "      <td>4</td>\n",
       "      <td>16</td>\n",
       "      <td>0.616636</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4997</th>\n",
       "      <td>49</td>\n",
       "      <td>4</td>\n",
       "      <td>17</td>\n",
       "      <td>0.785366</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4998</th>\n",
       "      <td>49</td>\n",
       "      <td>4</td>\n",
       "      <td>18</td>\n",
       "      <td>0.237264</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4999</th>\n",
       "      <td>49</td>\n",
       "      <td>4</td>\n",
       "      <td>19</td>\n",
       "      <td>0.567829</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      case_id  dim_id  reading_id     value\n",
       "4995       49       4          15  0.093895\n",
       "4996       49       4          16  0.616636\n",
       "4997       49       4          17  0.785366\n",
       "4998       49       4          18  0.237264\n",
       "4999       49       4          19  0.567829"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As shown below, applying the `from_long_to_nested` method returns a sktime-formatted dataset with individual dimensions represented by columns of the output dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:14.115195Z",
     "iopub.status.busy": "2020-12-19T14:32:14.071800Z",
     "iopub.status.idle": "2020-12-19T14:32:14.522026Z",
     "shell.execute_reply": "2020-12-19T14:32:14.522679Z"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>var_0</th>\n",
       "      <th>var_1</th>\n",
       "      <th>var_2</th>\n",
       "      <th>var_3</th>\n",
       "      <th>var_4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0     0.465112\n",
       "1     0.807199\n",
       "2     0.214023\n",
       "3...</td>\n",
       "      <td>0     0.255154\n",
       "1     0.243236\n",
       "2     0.745549\n",
       "3...</td>\n",
       "      <td>0     0.080859\n",
       "1     0.785212\n",
       "2     0.051891\n",
       "3...</td>\n",
       "      <td>0     0.441622\n",
       "1     0.711556\n",
       "2     0.541743\n",
       "3...</td>\n",
       "      <td>0     0.710037\n",
       "1     0.466908\n",
       "2     0.508830\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0     0.955744\n",
       "1     0.683644\n",
       "2     0.665061\n",
       "3...</td>\n",
       "      <td>0     0.927143\n",
       "1     0.811811\n",
       "2     0.896187\n",
       "3...</td>\n",
       "      <td>0     0.403733\n",
       "1     0.086684\n",
       "2     0.029619\n",
       "3...</td>\n",
       "      <td>0     0.820763\n",
       "1     0.980972\n",
       "2     0.298770\n",
       "3...</td>\n",
       "      <td>0     0.073021\n",
       "1     0.709833\n",
       "2     0.139826\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0     0.428306\n",
       "1     0.513439\n",
       "2     0.418690\n",
       "3...</td>\n",
       "      <td>0     0.657844\n",
       "1     0.307936\n",
       "2     0.072595\n",
       "3...</td>\n",
       "      <td>0     0.764560\n",
       "1     0.583594\n",
       "2     0.397959\n",
       "3...</td>\n",
       "      <td>0     0.197125\n",
       "1     0.186410\n",
       "2     0.380118\n",
       "3...</td>\n",
       "      <td>0     0.278130\n",
       "1     0.638953\n",
       "2     0.389453\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0     0.384345\n",
       "1     0.010171\n",
       "2     0.590353\n",
       "3...</td>\n",
       "      <td>0     0.249394\n",
       "1     0.428338\n",
       "2     0.377216\n",
       "3...</td>\n",
       "      <td>0     0.595839\n",
       "1     0.319225\n",
       "2     0.275726\n",
       "3...</td>\n",
       "      <td>0     0.476314\n",
       "1     0.590130\n",
       "2     0.980705\n",
       "3...</td>\n",
       "      <td>0     0.225630\n",
       "1     0.393970\n",
       "2     0.690442\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0     0.440827\n",
       "1     0.179104\n",
       "2     0.804322\n",
       "3...</td>\n",
       "      <td>0     0.461627\n",
       "1     0.200149\n",
       "2     0.122143\n",
       "3...</td>\n",
       "      <td>0     0.963287\n",
       "1     0.414079\n",
       "2     0.054978\n",
       "3...</td>\n",
       "      <td>0     0.532135\n",
       "1     0.310266\n",
       "2     0.670113\n",
       "3...</td>\n",
       "      <td>0     0.047700\n",
       "1     0.410984\n",
       "2     0.151532\n",
       "3...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               var_0  \\\n",
       "0  0     0.465112\n",
       "1     0.807199\n",
       "2     0.214023\n",
       "3...   \n",
       "1  0     0.955744\n",
       "1     0.683644\n",
       "2     0.665061\n",
       "3...   \n",
       "2  0     0.428306\n",
       "1     0.513439\n",
       "2     0.418690\n",
       "3...   \n",
       "3  0     0.384345\n",
       "1     0.010171\n",
       "2     0.590353\n",
       "3...   \n",
       "4  0     0.440827\n",
       "1     0.179104\n",
       "2     0.804322\n",
       "3...   \n",
       "\n",
       "                                               var_1  \\\n",
       "0  0     0.255154\n",
       "1     0.243236\n",
       "2     0.745549\n",
       "3...   \n",
       "1  0     0.927143\n",
       "1     0.811811\n",
       "2     0.896187\n",
       "3...   \n",
       "2  0     0.657844\n",
       "1     0.307936\n",
       "2     0.072595\n",
       "3...   \n",
       "3  0     0.249394\n",
       "1     0.428338\n",
       "2     0.377216\n",
       "3...   \n",
       "4  0     0.461627\n",
       "1     0.200149\n",
       "2     0.122143\n",
       "3...   \n",
       "\n",
       "                                               var_2  \\\n",
       "0  0     0.080859\n",
       "1     0.785212\n",
       "2     0.051891\n",
       "3...   \n",
       "1  0     0.403733\n",
       "1     0.086684\n",
       "2     0.029619\n",
       "3...   \n",
       "2  0     0.764560\n",
       "1     0.583594\n",
       "2     0.397959\n",
       "3...   \n",
       "3  0     0.595839\n",
       "1     0.319225\n",
       "2     0.275726\n",
       "3...   \n",
       "4  0     0.963287\n",
       "1     0.414079\n",
       "2     0.054978\n",
       "3...   \n",
       "\n",
       "                                               var_3  \\\n",
       "0  0     0.441622\n",
       "1     0.711556\n",
       "2     0.541743\n",
       "3...   \n",
       "1  0     0.820763\n",
       "1     0.980972\n",
       "2     0.298770\n",
       "3...   \n",
       "2  0     0.197125\n",
       "1     0.186410\n",
       "2     0.380118\n",
       "3...   \n",
       "3  0     0.476314\n",
       "1     0.590130\n",
       "2     0.980705\n",
       "3...   \n",
       "4  0     0.532135\n",
       "1     0.310266\n",
       "2     0.670113\n",
       "3...   \n",
       "\n",
       "                                               var_4  \n",
       "0  0     0.710037\n",
       "1     0.466908\n",
       "2     0.508830\n",
       "3...  \n",
       "1  0     0.073021\n",
       "1     0.709833\n",
       "2     0.139826\n",
       "3...  \n",
       "2  0     0.278130\n",
       "1     0.638953\n",
       "2     0.389453\n",
       "3...  \n",
       "3  0     0.225630\n",
       "1     0.393970\n",
       "2     0.690442\n",
       "3...  \n",
       "4  0     0.047700\n",
       "1     0.410984\n",
       "2     0.151532\n",
       "3...  "
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sktime.utils.data_processing import from_long_to_nested, from_nested_to_long\n",
    "\n",
    "X_nested = from_long_to_nested(X)\n",
    "X_nested.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As expected the result is a nested DataFrame and the cells include nested pandas Series objects. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2020-12-19T14:32:14.526778Z",
     "iopub.status.busy": "2020-12-19T14:32:14.526253Z",
     "iopub.status.idle": "2020-12-19T14:32:14.528291Z",
     "shell.execute_reply": "2020-12-19T14:32:14.528788Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "X_nested is a nested DataFrame: True\n",
      "The cell contains a <class 'pandas.core.series.Series'>.\n",
      "The nested DataFrame has shape (50, 5)\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "0    0.465112\n",
       "1    0.807199\n",
       "2    0.214023\n",
       "3    0.195609\n",
       "4    0.487599\n",
       "Name: 0, dtype: float64"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(f\"X_nested is a nested DataFrame: {is_nested_dataframe(X_nested)}\")\n",
    "print(f\"The cell contains a {type(X_nested.iloc[0,0])}.\")\n",
    "print(f\"The nested DataFrame has shape {X_nested.shape}\")\n",
    "X_nested.iloc[0, 0].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As shown below, the `from_nested_to_long` function can be used to convert the resulting nested DataFrame (or any nested DataFrame) to a long format DataFrame. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>case_id</th>\n",
       "      <th>reading_id</th>\n",
       "      <th>dim_id</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>var_0</td>\n",
       "      <td>0.465112</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>var_0</td>\n",
       "      <td>0.807199</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>var_0</td>\n",
       "      <td>0.214023</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>var_0</td>\n",
       "      <td>0.195609</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0</td>\n",
       "      <td>4</td>\n",
       "      <td>var_0</td>\n",
       "      <td>0.487599</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   case_id  reading_id dim_id     value\n",
       "0        0           0  var_0  0.465112\n",
       "1        0           1  var_0  0.807199\n",
       "2        0           2  var_0  0.214023\n",
       "3        0           3  var_0  0.195609\n",
       "4        0           4  var_0  0.487599"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_long = from_nested_to_long(\n",
    "    X_nested,\n",
    "    instance_column_name=\"case_id\",\n",
    "    time_column_name=\"reading_id\",\n",
    "    dimension_column_name=\"dim_id\",\n",
    ")\n",
    "X_long.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>case_id</th>\n",
       "      <th>reading_id</th>\n",
       "      <th>dim_id</th>\n",
       "      <th>value</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4995</th>\n",
       "      <td>49</td>\n",
       "      <td>15</td>\n",
       "      <td>var_4</td>\n",
       "      <td>0.093895</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4996</th>\n",
       "      <td>49</td>\n",
       "      <td>16</td>\n",
       "      <td>var_4</td>\n",
       "      <td>0.616636</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4997</th>\n",
       "      <td>49</td>\n",
       "      <td>17</td>\n",
       "      <td>var_4</td>\n",
       "      <td>0.785366</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4998</th>\n",
       "      <td>49</td>\n",
       "      <td>18</td>\n",
       "      <td>var_4</td>\n",
       "      <td>0.237264</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4999</th>\n",
       "      <td>49</td>\n",
       "      <td>19</td>\n",
       "      <td>var_4</td>\n",
       "      <td>0.567829</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      case_id  reading_id dim_id     value\n",
       "4995       49          15  var_4  0.093895\n",
       "4996       49          16  var_4  0.616636\n",
       "4997       49          17  var_4  0.785366\n",
       "4998       49          18  var_4  0.237264\n",
       "4999       49          19  var_4  0.567829"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_long.tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using multi-indexed pandas DataFrames\n",
    "\n",
    "Pandas deprecated its Panel object in version 0.20.1. Since that time pandas has recommended representing 3-dimensional data using a multi-indexed DataFrame. \n",
    "\n",
    "Storing timeseries data in a Pandas multi-indexed DataFrame is a natural option since many timeseries problems include data over the instance, feature and time dimensions. \n",
    "\n",
    "Sktime provides the functions `from_multi_index_to_nested` and `from_nested_to_multi_index` in `sktime.utils.data_processing` to easily convert between pandas multi-indexed DataFrames and sktime's nested DataFrame structure. \n",
    "\n",
    "The example below illustrates how these functions can be used to convert to and from the nested structure given data with 50 instances, 5 features (columns) and 20 timepoints per feature. In the multi-indexed DataFrame a row represents a unique combination of the instance and timepoint indices. Therefore, the resulting multi-indexed DataFrame should have the shape (1000, 5). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The multi-indexed DataFrame has shape (1000, 5)\n",
      "The multi-index names are ['case_id', 'reading_id']\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>var_0</th>\n",
       "      <th>var_1</th>\n",
       "      <th>var_2</th>\n",
       "      <th>var_3</th>\n",
       "      <th>var_4</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>case_id</th>\n",
       "      <th>reading_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">0</th>\n",
       "      <th>0</th>\n",
       "      <td>0.190750</td>\n",
       "      <td>0.783922</td>\n",
       "      <td>0.720836</td>\n",
       "      <td>0.409951</td>\n",
       "      <td>0.265496</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.778653</td>\n",
       "      <td>0.094755</td>\n",
       "      <td>0.616626</td>\n",
       "      <td>0.454311</td>\n",
       "      <td>0.594953</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.271877</td>\n",
       "      <td>0.171994</td>\n",
       "      <td>0.807786</td>\n",
       "      <td>0.008497</td>\n",
       "      <td>0.074649</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.437475</td>\n",
       "      <td>0.725512</td>\n",
       "      <td>0.104991</td>\n",
       "      <td>0.128412</td>\n",
       "      <td>0.909077</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.560182</td>\n",
       "      <td>0.373475</td>\n",
       "      <td>0.375314</td>\n",
       "      <td>0.181087</td>\n",
       "      <td>0.001008</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                       var_0     var_1     var_2     var_3     var_4\n",
       "case_id reading_id                                                  \n",
       "0       0           0.190750  0.783922  0.720836  0.409951  0.265496\n",
       "        1           0.778653  0.094755  0.616626  0.454311  0.594953\n",
       "        2           0.271877  0.171994  0.807786  0.008497  0.074649\n",
       "        3           0.437475  0.725512  0.104991  0.128412  0.909077\n",
       "        4           0.560182  0.373475  0.375314  0.181087  0.001008"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sktime.utils.data_io import make_multi_index_dataframe\n",
    "from sktime.utils.data_processing import (\n",
    "    from_multi_index_to_nested,\n",
    "    from_nested_to_multi_index,\n",
    ")\n",
    "\n",
    "X_mi = make_multi_index_dataframe(n_instances=50, n_columns=5, n_timepoints=20)\n",
    "\n",
    "print(f\"The multi-indexed DataFrame has shape {X_mi.shape}\")\n",
    "print(f\"The multi-index names are {X_mi.index.names}\")\n",
    "\n",
    "X_mi.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The multi-indexed DataFrame can be easily converted to a nested DataFrame with shape (50, 5). Note that the conversion to the nested DataFrame has preserved the column names (it has also preserved the values of the instance index and the pandas Series objects nested in each cell have preserved the time index). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "X_nested is a nested DataFrame: True\n",
      "The cell contains a <class 'pandas.core.series.Series'>.\n",
      "The nested DataFrame has shape (50, 5)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>var_0</th>\n",
       "      <th>var_1</th>\n",
       "      <th>var_2</th>\n",
       "      <th>var_3</th>\n",
       "      <th>var_4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0     0.190750\n",
       "1     0.778653\n",
       "2     0.271877\n",
       "3...</td>\n",
       "      <td>0     0.783922\n",
       "1     0.094755\n",
       "2     0.171994\n",
       "3...</td>\n",
       "      <td>0     0.720836\n",
       "1     0.616626\n",
       "2     0.807786\n",
       "3...</td>\n",
       "      <td>0     0.409951\n",
       "1     0.454311\n",
       "2     0.008497\n",
       "3...</td>\n",
       "      <td>0     0.265496\n",
       "1     0.594953\n",
       "2     0.074649\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0     0.164165\n",
       "1     0.243284\n",
       "2     0.518704\n",
       "3...</td>\n",
       "      <td>0     0.579280\n",
       "1     0.303707\n",
       "2     0.105113\n",
       "3...</td>\n",
       "      <td>0     0.301193\n",
       "1     0.736239\n",
       "2     0.064917\n",
       "3...</td>\n",
       "      <td>0     0.854033\n",
       "1     0.648855\n",
       "2     0.816810\n",
       "3...</td>\n",
       "      <td>0     0.099076\n",
       "1     0.882771\n",
       "2     0.470318\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0     0.673390\n",
       "1     0.942796\n",
       "2     0.217408\n",
       "3...</td>\n",
       "      <td>0     0.165936\n",
       "1     0.213876\n",
       "2     0.693246\n",
       "3...</td>\n",
       "      <td>0     0.218683\n",
       "1     0.669671\n",
       "2     0.417209\n",
       "3...</td>\n",
       "      <td>0     0.540144\n",
       "1     0.973955\n",
       "2     0.940980\n",
       "3...</td>\n",
       "      <td>0     0.327064\n",
       "1     0.715518\n",
       "2     0.034418\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0     0.285121\n",
       "1     0.949818\n",
       "2     0.337634\n",
       "3...</td>\n",
       "      <td>0     0.521327\n",
       "1     0.403082\n",
       "2     0.648529\n",
       "3...</td>\n",
       "      <td>0     0.284312\n",
       "1     0.361265\n",
       "2     0.162676\n",
       "3...</td>\n",
       "      <td>0     0.317464\n",
       "1     0.593125\n",
       "2     0.921290\n",
       "3...</td>\n",
       "      <td>0     0.513320\n",
       "1     0.028841\n",
       "2     0.697777\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0     0.741211\n",
       "1     0.029021\n",
       "2     0.988874\n",
       "3...</td>\n",
       "      <td>0     0.424564\n",
       "1     0.750290\n",
       "2     0.954112\n",
       "3...</td>\n",
       "      <td>0     0.050796\n",
       "1     0.471665\n",
       "2     0.923734\n",
       "3...</td>\n",
       "      <td>0     0.378101\n",
       "1     0.174072\n",
       "2     0.130620\n",
       "3...</td>\n",
       "      <td>0     0.357425\n",
       "1     0.651974\n",
       "2     0.812803\n",
       "3...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               var_0  \\\n",
       "0  0     0.190750\n",
       "1     0.778653\n",
       "2     0.271877\n",
       "3...   \n",
       "1  0     0.164165\n",
       "1     0.243284\n",
       "2     0.518704\n",
       "3...   \n",
       "2  0     0.673390\n",
       "1     0.942796\n",
       "2     0.217408\n",
       "3...   \n",
       "3  0     0.285121\n",
       "1     0.949818\n",
       "2     0.337634\n",
       "3...   \n",
       "4  0     0.741211\n",
       "1     0.029021\n",
       "2     0.988874\n",
       "3...   \n",
       "\n",
       "                                               var_1  \\\n",
       "0  0     0.783922\n",
       "1     0.094755\n",
       "2     0.171994\n",
       "3...   \n",
       "1  0     0.579280\n",
       "1     0.303707\n",
       "2     0.105113\n",
       "3...   \n",
       "2  0     0.165936\n",
       "1     0.213876\n",
       "2     0.693246\n",
       "3...   \n",
       "3  0     0.521327\n",
       "1     0.403082\n",
       "2     0.648529\n",
       "3...   \n",
       "4  0     0.424564\n",
       "1     0.750290\n",
       "2     0.954112\n",
       "3...   \n",
       "\n",
       "                                               var_2  \\\n",
       "0  0     0.720836\n",
       "1     0.616626\n",
       "2     0.807786\n",
       "3...   \n",
       "1  0     0.301193\n",
       "1     0.736239\n",
       "2     0.064917\n",
       "3...   \n",
       "2  0     0.218683\n",
       "1     0.669671\n",
       "2     0.417209\n",
       "3...   \n",
       "3  0     0.284312\n",
       "1     0.361265\n",
       "2     0.162676\n",
       "3...   \n",
       "4  0     0.050796\n",
       "1     0.471665\n",
       "2     0.923734\n",
       "3...   \n",
       "\n",
       "                                               var_3  \\\n",
       "0  0     0.409951\n",
       "1     0.454311\n",
       "2     0.008497\n",
       "3...   \n",
       "1  0     0.854033\n",
       "1     0.648855\n",
       "2     0.816810\n",
       "3...   \n",
       "2  0     0.540144\n",
       "1     0.973955\n",
       "2     0.940980\n",
       "3...   \n",
       "3  0     0.317464\n",
       "1     0.593125\n",
       "2     0.921290\n",
       "3...   \n",
       "4  0     0.378101\n",
       "1     0.174072\n",
       "2     0.130620\n",
       "3...   \n",
       "\n",
       "                                               var_4  \n",
       "0  0     0.265496\n",
       "1     0.594953\n",
       "2     0.074649\n",
       "3...  \n",
       "1  0     0.099076\n",
       "1     0.882771\n",
       "2     0.470318\n",
       "3...  \n",
       "2  0     0.327064\n",
       "1     0.715518\n",
       "2     0.034418\n",
       "3...  \n",
       "3  0     0.513320\n",
       "1     0.028841\n",
       "2     0.697777\n",
       "3...  \n",
       "4  0     0.357425\n",
       "1     0.651974\n",
       "2     0.812803\n",
       "3...  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_nested = from_multi_index_to_nested(X_mi, instance_index=\"case_id\")\n",
    "print(f\"X_nested is a nested DataFrame: {is_nested_dataframe(X_nested)}\")\n",
    "print(f\"The cell contains a {type(X_nested.iloc[0,0])}.\")\n",
    "print(f\"The nested DataFrame has shape {X_nested.shape}\")\n",
    "X_nested.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Nested DataFrames can also be converted to a multi-indexed pandas DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th>var_0</th>\n",
       "      <th>var_1</th>\n",
       "      <th>var_2</th>\n",
       "      <th>var_3</th>\n",
       "      <th>var_4</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>case_id</th>\n",
       "      <th>reading_id</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th rowspan=\"5\" valign=\"top\">0</th>\n",
       "      <th>0</th>\n",
       "      <td>0.190750</td>\n",
       "      <td>0.783922</td>\n",
       "      <td>0.720836</td>\n",
       "      <td>0.409951</td>\n",
       "      <td>0.265496</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.778653</td>\n",
       "      <td>0.094755</td>\n",
       "      <td>0.616626</td>\n",
       "      <td>0.454311</td>\n",
       "      <td>0.594953</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0.271877</td>\n",
       "      <td>0.171994</td>\n",
       "      <td>0.807786</td>\n",
       "      <td>0.008497</td>\n",
       "      <td>0.074649</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0.437475</td>\n",
       "      <td>0.725512</td>\n",
       "      <td>0.104991</td>\n",
       "      <td>0.128412</td>\n",
       "      <td>0.909077</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0.560182</td>\n",
       "      <td>0.373475</td>\n",
       "      <td>0.375314</td>\n",
       "      <td>0.181087</td>\n",
       "      <td>0.001008</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                       var_0     var_1     var_2     var_3     var_4\n",
       "case_id reading_id                                                  \n",
       "0       0           0.190750  0.783922  0.720836  0.409951  0.265496\n",
       "        1           0.778653  0.094755  0.616626  0.454311  0.594953\n",
       "        2           0.271877  0.171994  0.807786  0.008497  0.074649\n",
       "        3           0.437475  0.725512  0.104991  0.128412  0.909077\n",
       "        4           0.560182  0.373475  0.375314  0.181087  0.001008"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_mi = from_nested_to_multi_index(\n",
    "    X_nested, instance_index=\"case_id\", time_index=\"reading_id\"\n",
    ")\n",
    "X_mi.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Using NumPy 3d-arrays with sktime\n",
    "\n",
    "Another common approach for representing timeseries data is to use a 3-dimensional NumPy array with shape (n_instances, n_columns, n_timepoints). \n",
    "\n",
    "Sktime provides the functions `from_3d_numpy_to_nested` and `from_nested_to_3d_numpy` in `sktime.utils.data_processing` to let users easily convert between NumPy 3d-arrays and nested pandas DataFrames. \n",
    "\n",
    "This is demonstrated using a 3d-array with 50 instances, 5 features (columns) and 20 timepoints, i.e. an array with shape (50, 5, 20). "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The 3d-array has shape (50, 5, 20)\n"
     ]
    }
   ],
   "source": [
    "from sktime.utils.data_processing import (\n",
    "    from_3d_numpy_to_nested,\n",
    "    from_multi_index_to_3d_numpy,\n",
    "    from_nested_to_3d_numpy,\n",
    ")\n",
    "\n",
    "X_mi = make_multi_index_dataframe(n_instances=50, n_columns=5, n_timepoints=20)\n",
    "X_3d = from_multi_index_to_3d_numpy(\n",
    "    X_mi, instance_index=\"case_id\", time_index=\"reading_id\"\n",
    ")\n",
    "\n",
    "print(f\"The 3d-array has shape {X_3d.shape}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The 3d-array can be easily converted to a nested DataFrame with shape (50, 5). Note that since NumPy arrays do not carry index labels, the instance index is the numerical range over the number of instances and the column names are assigned automatically. Users can optionally supply their own column names via the `column_names` parameter. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "X_nested is a nested DataFrame: True\n",
      "The cell contains a <class 'pandas.core.series.Series'>.\n",
      "The nested DataFrame has shape (50, 5)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>var_0</th>\n",
       "      <th>var_1</th>\n",
       "      <th>var_2</th>\n",
       "      <th>var_3</th>\n",
       "      <th>var_4</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0     0.121508\n",
       "1     0.231054\n",
       "2     0.380812\n",
       "3...</td>\n",
       "      <td>0     0.172563\n",
       "1     0.843319\n",
       "2     0.867677\n",
       "3...</td>\n",
       "      <td>0     0.501176\n",
       "1     0.800671\n",
       "2     0.001438\n",
       "3...</td>\n",
       "      <td>0     0.680257\n",
       "1     0.189607\n",
       "2     0.238900\n",
       "3...</td>\n",
       "      <td>0     0.941062\n",
       "1     0.376775\n",
       "2     0.814203\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0     0.504065\n",
       "1     0.266924\n",
       "2     0.839628\n",
       "3...</td>\n",
       "      <td>0     0.931253\n",
       "1     0.947567\n",
       "2     0.743263\n",
       "3...</td>\n",
       "      <td>0     0.949248\n",
       "1     0.705591\n",
       "2     0.028871\n",
       "3...</td>\n",
       "      <td>0     0.901362\n",
       "1     0.759232\n",
       "2     0.580433\n",
       "3...</td>\n",
       "      <td>0     0.846453\n",
       "1     0.778602\n",
       "2     0.695229\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>0     0.658493\n",
       "1     0.230315\n",
       "2     0.274096\n",
       "3...</td>\n",
       "      <td>0     0.727638\n",
       "1     0.099374\n",
       "2     0.170330\n",
       "3...</td>\n",
       "      <td>0     0.327489\n",
       "1     0.923184\n",
       "2     0.232839\n",
       "3...</td>\n",
       "      <td>0     0.641788\n",
       "1     0.498409\n",
       "2     0.028307\n",
       "3...</td>\n",
       "      <td>0     0.751264\n",
       "1     0.365060\n",
       "2     0.603343\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>0     0.485202\n",
       "1     0.688087\n",
       "2     0.058469\n",
       "3...</td>\n",
       "      <td>0     0.941756\n",
       "1     0.511861\n",
       "2     0.269905\n",
       "3...</td>\n",
       "      <td>0     0.977654\n",
       "1     0.595739\n",
       "2     0.166681\n",
       "3...</td>\n",
       "      <td>0     0.600651\n",
       "1     0.469100\n",
       "2     0.917114\n",
       "3...</td>\n",
       "      <td>0     0.461481\n",
       "1     0.509542\n",
       "2     0.085183\n",
       "3...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>0     0.867503\n",
       "1     0.832123\n",
       "2     0.273478\n",
       "3...</td>\n",
       "      <td>0     0.817830\n",
       "1     0.983938\n",
       "2     0.327779\n",
       "3...</td>\n",
       "      <td>0     0.336237\n",
       "1     0.715929\n",
       "2     0.489844\n",
       "3...</td>\n",
       "      <td>0     0.128253\n",
       "1     0.146269\n",
       "2     0.093848\n",
       "3...</td>\n",
       "      <td>0     0.097085\n",
       "1     0.348389\n",
       "2     0.560473\n",
       "3...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               var_0  \\\n",
       "0  0     0.121508\n",
       "1     0.231054\n",
       "2     0.380812\n",
       "3...   \n",
       "1  0     0.504065\n",
       "1     0.266924\n",
       "2     0.839628\n",
       "3...   \n",
       "2  0     0.658493\n",
       "1     0.230315\n",
       "2     0.274096\n",
       "3...   \n",
       "3  0     0.485202\n",
       "1     0.688087\n",
       "2     0.058469\n",
       "3...   \n",
       "4  0     0.867503\n",
       "1     0.832123\n",
       "2     0.273478\n",
       "3...   \n",
       "\n",
       "                                               var_1  \\\n",
       "0  0     0.172563\n",
       "1     0.843319\n",
       "2     0.867677\n",
       "3...   \n",
       "1  0     0.931253\n",
       "1     0.947567\n",
       "2     0.743263\n",
       "3...   \n",
       "2  0     0.727638\n",
       "1     0.099374\n",
       "2     0.170330\n",
       "3...   \n",
       "3  0     0.941756\n",
       "1     0.511861\n",
       "2     0.269905\n",
       "3...   \n",
       "4  0     0.817830\n",
       "1     0.983938\n",
       "2     0.327779\n",
       "3...   \n",
       "\n",
       "                                               var_2  \\\n",
       "0  0     0.501176\n",
       "1     0.800671\n",
       "2     0.001438\n",
       "3...   \n",
       "1  0     0.949248\n",
       "1     0.705591\n",
       "2     0.028871\n",
       "3...   \n",
       "2  0     0.327489\n",
       "1     0.923184\n",
       "2     0.232839\n",
       "3...   \n",
       "3  0     0.977654\n",
       "1     0.595739\n",
       "2     0.166681\n",
       "3...   \n",
       "4  0     0.336237\n",
       "1     0.715929\n",
       "2     0.489844\n",
       "3...   \n",
       "\n",
       "                                               var_3  \\\n",
       "0  0     0.680257\n",
       "1     0.189607\n",
       "2     0.238900\n",
       "3...   \n",
       "1  0     0.901362\n",
       "1     0.759232\n",
       "2     0.580433\n",
       "3...   \n",
       "2  0     0.641788\n",
       "1     0.498409\n",
       "2     0.028307\n",
       "3...   \n",
       "3  0     0.600651\n",
       "1     0.469100\n",
       "2     0.917114\n",
       "3...   \n",
       "4  0     0.128253\n",
       "1     0.146269\n",
       "2     0.093848\n",
       "3...   \n",
       "\n",
       "                                               var_4  \n",
       "0  0     0.941062\n",
       "1     0.376775\n",
       "2     0.814203\n",
       "3...  \n",
       "1  0     0.846453\n",
       "1     0.778602\n",
       "2     0.695229\n",
       "3...  \n",
       "2  0     0.751264\n",
       "1     0.365060\n",
       "2     0.603343\n",
       "3...  \n",
       "3  0     0.461481\n",
       "1     0.509542\n",
       "2     0.085183\n",
       "3...  \n",
       "4  0     0.097085\n",
       "1     0.348389\n",
       "2     0.560473\n",
       "3...  "
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_nested = from_3d_numpy_to_nested(X_3d)\n",
    "print(f\"X_nested is a nested DataFrame: {is_nested_dataframe(X_nested)}\")\n",
    "print(f\"The cell contains a {type(X_nested.iloc[0,0])}.\")\n",
    "print(f\"The nested DataFrame has shape {X_nested.shape}\")\n",
    "X_nested.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Nested DataFrames can also be converted to NumPy 3d-arrays. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The resulting object is a <class 'numpy.ndarray'>\n",
      "The shape of the 3d-array is (50, 5, 20)\n"
     ]
    }
   ],
   "source": [
    "X_3d = from_nested_to_3d_numpy(X_nested)\n",
    "print(f\"The resulting object is a {type(X_3d)}\")\n",
    "print(f\"The shape of the 3d-array is {X_3d.shape}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Converting between NumPy 3d-arrays and pandas multi-indexed DataFrame\n",
    "\n",
    "Sktime also lets users convert data directly between NumPy 3d-arrays and multi-indexed pandas DataFrames using the functions `from_3d_numpy_to_multi_index` and `from_multi_index_to_3d_numpy` in `sktime.utils.data_processing`. The latter was used above to build the example 3d-array. "
   ]
  }
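  ,
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The correspondence between the two layouts can be sketched with plain NumPy and pandas. This is a minimal illustration, not sktime's implementation: it assumes equal-length series and integer instance and time indices, and it reuses the `case_id`, `reading_id` and `var_i` names from the examples above. The small array sizes are made up for the example.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical small example: 3 instances, 2 columns, 4 timepoints.\n",
    "n_instances, n_columns, n_timepoints = 3, 2, 4\n",
    "X_3d = np.arange(n_instances * n_columns * n_timepoints, dtype=float).reshape(\n",
    "    n_instances, n_columns, n_timepoints\n",
    ")\n",
    "\n",
    "# 3d-array -> multi-indexed DataFrame: one row per (instance, timepoint) pair.\n",
    "index = pd.MultiIndex.from_product(\n",
    "    [range(n_instances), range(n_timepoints)], names=[\"case_id\", \"reading_id\"]\n",
    ")\n",
    "# Move the column axis last so each row holds all variables at one timepoint.\n",
    "values = X_3d.transpose(0, 2, 1).reshape(n_instances * n_timepoints, n_columns)\n",
    "column_names = [f\"var_{i}\" for i in range(n_columns)]\n",
    "X_mi = pd.DataFrame(values, index=index, columns=column_names)\n",
    "\n",
    "# Multi-indexed DataFrame -> 3d-array: undo the reshape and the transpose.\n",
    "X_back = (\n",
    "    X_mi.to_numpy()\n",
    "    .reshape(n_instances, n_timepoints, n_columns)\n",
    "    .transpose(0, 2, 1)\n",
    ")\n",
    "\n",
    "print(f\"The multi-indexed DataFrame has shape {X_mi.shape}\")\n",
    "print(f\"The round trip recovers the 3d-array: {np.array_equal(X_3d, X_back)}\")\n",
    "```\n",
    "\n",
    "The reverse step relies on the rows being ordered by `case_id` and then `reading_id`, and on every instance having the same number of timepoints. "
   ]
  }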
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
